1.
arxiv; 2023.
Preprint in English | PREPRINT-ARXIV | ID: ppzbmed-2307.09580v1

ABSTRACT

The classical Sankoff algorithm for the simultaneous folding and alignment of homologous RNA sequences is highly influential, but it suffers from two major limitations in efficiency and modeling power. First, it takes $O(n^6)$ time for two sequences, where $n$ is the average sequence length. Most implementations and variations reduce the runtime to $O(n^3)$ by restricting the alignment search space, but this is still too slow for long sequences such as full-length viral genomes. Second, the Sankoff algorithm and all its existing implementations use a rather simplistic alignment model, which can result in poor alignment accuracy. To address these problems, we propose LinearSankoff, which seamlessly integrates the original Sankoff algorithm with a powerful Hidden Markov Model-based alignment module. This extension substantially improves alignment quality, which in turn benefits secondary structure prediction quality, as confirmed over a diverse set of RNA families. LinearSankoff also applies beam search heuristics and an A$^\star$-like algorithm so that its runtime scales linearly with sequence length. LinearSankoff is the first linear-time algorithm for simultaneous folding and alignment, and the first such algorithm to scale to coronavirus genomes ($n \approx 30{,}000$ nt). It takes only 10 minutes for a pair of SARS-CoV-2 and SARS-related genomes, and outperforms previous work at identifying crucial conserved structures between the two genomes.
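
The beam search idea mentioned in this abstract can be illustrated generically: at each step of a dynamic program, only the b best-scoring states are kept, so the work per step is bounded by the beam size b instead of growing with n. The sketch below is a minimal illustration under that assumption, not LinearSankoff's implementation; the helper `prune_to_beam`, the state keys, and the scores are all hypothetical.

```python
import heapq
from typing import Dict, Hashable


def prune_to_beam(states: Dict[Hashable, float], beam_size: int) -> Dict[Hashable, float]:
    """Keep only the `beam_size` highest-scoring states (hypothetical helper)."""
    if beam_size <= 0:
        raise ValueError("beam_size must be positive")
    if len(states) <= beam_size:
        # Nothing to prune: return a copy so the caller's dict is untouched.
        return dict(states)
    # heapq.nlargest selects the top-b items without fully sorting all states.
    best = heapq.nlargest(beam_size, states.items(), key=lambda kv: kv[1])
    return dict(best)


# Toy candidate states with made-up scores (higher is better).
candidates = {"a": -1.2, "b": -0.3, "c": -4.0, "d": -0.9}
survivors = prune_to_beam(candidates, beam_size=2)
print(sorted(survivors))
```

Because pruning discards states that might have led to the optimum, a beam search of this kind is approximate; a larger beam trades runtime for accuracy.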

2.
researchsquare; 2021.
Preprint in English | PREPRINT-RESEARCHSQUARE | ID: ppzbmed-10.21203.rs.3.rs-642805.v1

ABSTRACT

CTSL is one of the proteases associated with SARS-CoV-2 entry and plays a key role in the virus's entry into the cell and subsequent infection. We investigated the association between the expression level of CTSL and overall survival in the TCGA and CGGA databases, in order to better understand the possible route and risks of new coronavirus infection for patients with GBM. The relationship between CTSL and immune infiltration levels was analyzed using the TIMER database, and the impact of CTSL inhibitors on GBM biological activity was tested. The findings revealed that GBM tissues had higher CTSL expression levels than normal brain tissues, and that higher expression was associated with a significantly lower survival rate in GBM patients. CTSL was also strongly negatively correlated with purity, B cell and CD8+ T cell levels in GBM. A CTSL inhibitor significantly reduced U251 cell growth and invasion in vitro and induced mitochondrial apoptosis. According to the findings of this study, CTSL acts as an independent prognostic factor and can be considered a promising therapeutic target for GBM.


Subject(s)
Coronavirus Infections
3.
4.
biorxiv; 2020.
Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2020.12.29.424617

ABSTRACT

Many RNAs fold into multiple structures at equilibrium. The classical stochastic sampling algorithm can sample secondary structures according to their probabilities in the Boltzmann ensemble, and is widely used. However, this algorithm, consisting of a bottom-up partition function phase followed by a top-down sampling phase, suffers from three limitations: (a) the formulation and implementation of the sampling phase are unnecessarily complicated; (b) the sampling phase repeatedly recalculates many redundant recursions already done during the partition function phase; (c) the partition function runtime scales cubically with the sequence length. These issues prevent stochastic sampling from being used for very long RNAs such as the full genomes of SARS-CoV-2. To address these problems, we first adopt a hypergraph framework under which the sampling algorithm can be greatly simplified. We then present three sampling algorithms under this framework, among which the LazySampling algorithm is the fastest by eliminating redundant work in the sampling phase via on-demand caching. Based on LazySampling, we further replace the cubic-time partition function by a linear-time approximate one, and derive LinearSampling, an end-to-end linear-time sampling algorithm that is orders of magnitude faster than the standard one. For instance, LinearSampling is 176× faster (38.9 s vs. 1.9 h) than Vienna RNAsubopt on the full genome of Ebola virus (18,959 nt). More importantly, LinearSampling is the first RNA structure sampling algorithm to scale up to the full genome of SARS-CoV-2 without local window constraints, taking only 69.2 seconds on its reference sequence (29,903 nt). The resulting sample correlates well with the experimentally guided structures. On the SARS-CoV-2 genome, LinearSampling finds 23 regions of 15 nt with high accessibilities, which are potential targets for COVID-19 diagnostics and drug design. See code: https://github.com/LinearFold/LinearSampling
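
As a rough illustration of what sampling "according to probabilities in the Boltzmann ensemble" means, the sketch below assigns each structure a probability proportional to exp(-E/RT) and draws samples from that distribution. It is a toy that enumerates a handful of structures explicitly, which is an assumption made for brevity; the algorithms described in the abstract instead work from dynamic-programming tables and never enumerate the ensemble. The structures, energies, and function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

GAS_CONSTANT_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def boltzmann_probabilities(energies_kcal: Sequence[float],
                            temperature_k: float = 310.15) -> List[float]:
    """Return P(s) = exp(-E_s/RT) / Z for each energy in the list."""
    rt = GAS_CONSTANT_KCAL * temperature_k
    lowest = min(energies_kcal)
    # Shifting by the lowest energy avoids overflow and leaves P(s) unchanged.
    weights = [math.exp(-(e - lowest) / rt) for e in energies_kcal]
    partition = sum(weights)  # the (shifted) partition function Z
    return [w / partition for w in weights]


def sample_structures(structures: Sequence[str], energies_kcal: Sequence[float],
                      k: int, seed: int = 0) -> List[str]:
    """Draw k structures with replacement according to Boltzmann probabilities."""
    rng = random.Random(seed)
    probs = boltzmann_probabilities(energies_kcal)
    return rng.choices(list(structures), weights=probs, k=k)


# Made-up dot-bracket structures and free energies (kcal/mol).
structures = ["((..))", "(....)", "......"]
energies = [-2.0, -1.0, 0.0]
probs = boltzmann_probabilities(energies)
samples = sample_structures(structures, energies, k=5)
print(probs, samples)
```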


Subject(s)
COVID-19 , Hemorrhagic Fever, Ebola
5.
biorxiv; 2020.
Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2020.11.23.393488

ABSTRACT

Many functional RNA structures are conserved across evolution, and such conserved structures provide critical targets for diagnostics and treatment. TurboFold II is a state-of-the-art software that can predict conserved structures and alignments given homologous sequences, but its cubic runtime and quadratic memory usage with sequence length prevent it from being applied to most full-length viral genomes. As the COVID-19 outbreak spreads, there is a growing need to have a fast and accurate tool to identify conserved regions of SARS-CoV-2. To address this issue, we present LinearTurboFold, which successfully accelerates TurboFold II without sacrificing accuracy on secondary structure and multiple sequence alignment prediction. LinearTurboFold is orders of magnitude faster than TurboFold II, e.g., 372 times faster (12 minutes vs. 3.1 days) on a group of five HIV-1 homologs with average length 9,686 nt. LinearTurboFold is able to scale up to the full sequence of SARS-CoV-2, and identifies conserved structures that have been supported by previous studies. Additionally, LinearTurboFold finds a list of novel conserved regions, including long-range base pairs, which may be useful for better understanding the virus.
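
The quoted speedup can be checked with simple arithmetic. The snippet below only converts the runtimes stated in the abstract (12 minutes and 3.1 days) to seconds and divides them; the `speedup` helper is a hypothetical name introduced for illustration.

```python
def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """Ratio of baseline runtime to new runtime (hypothetical helper)."""
    if new_seconds <= 0:
        raise ValueError("new_seconds must be positive")
    return baseline_seconds / new_seconds


SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 24 * 60 * 60

turbofold_seconds = 3.1 * SECONDS_PER_DAY       # 3.1 days, as stated
linear_seconds = 12 * SECONDS_PER_MINUTE        # 12 minutes, as stated
ratio = speedup(turbofold_seconds, linear_seconds)
print(round(ratio))
```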


Subject(s)
COVID-19
6.
biorxiv; 2020.
Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2020.05.01.071050

ABSTRACT

Summary: COVID-19 has become a global pandemic not long after its inception in late 2019. SARS-CoV-2 genomes are being sequenced and shared on public repositories at a fast pace. To keep up with these updates, scientists need to frequently refresh and reclean datasets, which is ad hoc and labor-intensive. Further, scientists with limited bioinformatics or programming knowledge may find it difficult to analyze SARS-CoV-2 genomes. To address these challenges, we developed CoV-Seq, a webserver to enable simple and rapid analysis of SARS-CoV-2 genomes. Given a new sequence, CoV-Seq automatically predicts gene boundaries and identifies genetic variants, which are presented in an interactive genome visualizer and are downloadable for further analysis. A command-line interface is also available for high-throughput processing. Availability and Implementation: CoV-Seq is implemented in Python and JavaScript. The webserver is available at http://covseq.baidu.com/ and the source code is available from https://github.com/boxiangliu/covseq. Contact: jollier.liu@gmail.com. Supplementary information: Supplementary information is available at bioRxiv online.


Subject(s)
COVID-19
7.
arxiv; 2020.
Preprint in English | PREPRINT-ARXIV | ID: ppzbmed-2004.10177v4

ABSTRACT

A messenger RNA (mRNA) vaccine has emerged as a promising direction to combat the current COVID-19 pandemic. This requires an mRNA sequence that is stable and highly productive in protein expression, features which have been shown to benefit from greater mRNA secondary structure folding stability and optimal codon usage. However, sequence design remains a hard problem due to the exponentially many synonymous mRNA sequences that encode the same protein. We show that this design problem can be reduced to a classical problem in formal language theory and computational linguistics that can be solved in $O(n^3)$ time, where $n$ is the mRNA sequence length. This algorithm could still be too slow for large $n$ (e.g., $n = 3{,}822$ nucleotides for the spike protein of SARS-CoV-2), so we further developed a linear-time approximate version, LinearDesign, inspired by our recent work, LinearFold. This algorithm, LinearDesign, can compute the approximate minimum free energy mRNA sequence for this spike protein in just 11 minutes using beam size $b = 1{,}000$, with only 0.6% loss in free energy change compared to exact search (i.e., $b = +\infty$, which costs 1 hour). We also develop two algorithms for incorporating the codon optimality into the design, one based on k-best parsing to find alternative sequences and one directly incorporating codon optimality into the dynamic programming. Our work provides efficient computational tools to speed up and improve mRNA vaccine development.
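
To make the "exponentially many synonymous mRNA sequences" concrete, the sketch below counts how many coding sequences encode a given peptide by multiplying the number of codons available for each amino acid in the standard genetic code. The function name and the toy peptide are hypothetical, and the count ignores the stop codon; it only illustrates the size of the search space, not the LinearDesign algorithm.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous_sequences(protein: str) -> int:
    """Count mRNA coding sequences that translate to `protein` (hypothetical helper)."""
    unknown = set(protein) - set(CODON_DEGENERACY)
    if unknown:
        raise ValueError(f"unknown amino acid codes: {sorted(unknown)}")
    # Codon choices are independent per position, so the counts multiply.
    return prod(CODON_DEGENERACY[aa] for aa in protein)


toy_peptide = "MFVFL"  # arbitrary five-residue example
print(count_synonymous_sequences(toy_peptide))  # 1 * 2 * 4 * 2 * 6 = 96
```

Since most amino acids have at least two codons, the count grows roughly exponentially with protein length, which is why exhaustive enumeration is infeasible for a protein encoded by thousands of nucleotides.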


Subject(s)
COVID-19
8.
Aging (Albany NY) ; 12(7): 6037-6048, 2020 04 10.
Article in English | MEDLINE | ID: covidwho-45873

ABSTRACT

OBJECTIVE: This study aimed to investigate the potential parameters associated with imaging progression on chest CT in coronavirus disease 19 (COVID-19) patients. RESULTS: Among the 273 COVID-19 patients enrolled, those with imaging progression were older on average than those without imaging progression (p = 0.006). The white blood cells, platelets, neutrophils and acid glycoprotein were all decreased in imaging progression patients (all p < 0.05), and monocytes were increased (p = 0.025). Homocysteine, urea, creatinine and serum cystatin C were significantly higher in imaging progression patients (all p < 0.05), while eGFR was decreased (p < 0.001). Monocyte-lymphocyte ratio (MLR) was significantly higher in imaging progression patients than in imaging progression-free ones (p < 0.001). Logistic models revealed that age, MLR, homocysteine and period from onset to admission were factors for predicting imaging progression on chest CT in the first week in COVID-19 patients (all p < 0.05). CONCLUSION: Age, MLR, homocysteine and period from onset to admission could predict imaging progression on chest CT in COVID-19 patients. METHODS: The primary outcome was imaging progression on chest CT. Baseline parameters were collected on the first day of admission. Imaging manifestations on chest CT were followed up at (6±1) days.


Subject(s)
Coronavirus Infections/diagnostic imaging , Coronavirus Infections/pathology , Pneumonia, Viral/diagnostic imaging , Pneumonia, Viral/pathology , COVID-19 , Coronavirus Infections/virology , Disease Progression , Female , Humans , Male , Middle Aged , Pandemics , Pneumonia, Viral/virology , Thorax/diagnostic imaging , Thorax/virology , Tomography, X-Ray Computed
9.
preprints.org; 2020.
Preprint in English | PREPRINT-PREPRINTS.ORG | ID: ppzbmed-10.20944.preprints202002.0167.v1

ABSTRACT

The outbreak of the 2019 Novel Coronavirus (2019-nCoV) has rapidly spread from Wuhan, China to multiple countries, causing a staggering number of infections and deaths. A systematic profiling of the immune vulnerability landscape of 2019-nCoV, which can bring critical insights into the immune clearance mechanism, peptide vaccine development, and antiviral antibody development, is lacking. In this study, we predicted the potential of all the 2019-nCoV viral proteins to induce class I and II MHC presentation and form linear antibody epitopes. We showed that the enrichment for T cell and B cell epitopes is not uniform on the viral genome, with several focused regions that generate abundant epitopes and may be more targetable. We showed that genetic variations in 2019-nCoV, though still few for the moment, already follow the pattern of mutations in related coronaviruses, and could alter the immune vulnerability landscape of this virus, which should be considered in the development of therapies. We created an online database to broadly share our research outcome. Overall, we present an immunological resource for 2019-nCoV that could significantly promote both therapeutic development and mechanistic research.


Subject(s)
Death
10.
biorxiv; 2020.
Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2020.02.08.939553

ABSTRACT

The outbreak of the 2019 Novel Coronavirus (SARS-CoV-2) rapidly spread from Wuhan, China to more than 150 countries, areas or territories, causing a staggering number of infections and deaths. A systematic profiling of the immune vulnerability landscape of SARS-CoV-2, which can bring critical insights into the immune clearance mechanism, peptide vaccine development, and antiviral antibody development, is lacking. In this study, we investigated the potential of the SARS-CoV-2 viral proteins to induce class I and II MHC presentation and to form linear antibody epitopes. We created an online database to broadly share the predictions as a resource for the research community. Using this resource, we showed that genetic variations in SARS-CoV-2, though still few for the moment, already follow the pattern of mutations in related coronaviruses, and could alter the immune vulnerability landscape of this virus. Importantly, we discovered evidence that SARS-CoV-2, along with related coronaviruses, used mutations to evade attack from the human immune system. Overall, we present an immunological resource for SARS-CoV-2 that could promote both therapeutic development and mechanistic research.


Subject(s)
Death